Text Analysis of Biden and Trump Speeches During the 2020 Presidential Election

Introduction

The United States presidential election is one of the most followed political events in the world. As a result, many people study election data, hoping both to make predictions and to inform the public about the current state of the race. In this blog post, we analyze data from a key component of the election process: speeches given by the candidates. In particular, we analyze text from speeches given by Joe Biden and Donald Trump during the lead-up to the 2020 election. The primary questions we hoped to answer were:

  1. What are the most common words and phrases used by Trump and Biden?
  2. What are the relationships between those words/phrases?
  3. How did the frequency of these words/phrases change over time?

We answer these questions through a series of visualizations that display results obtained with several text analysis techniques.

Data

Visualizations

To address our three questions, we created three types of visualizations, one for each question. To identify the most frequent words in the candidates' speeches, we created word clouds in which font size corresponds to word frequency. To identify relationships between words, we created network graphs in which edge width corresponds to the “closeness” of the words within the documents (we define “closeness” in the network section). Lastly, we created line graphs to track changes in word frequencies over time.
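The line graphs need one value per speech and date for each word of interest. Below is a minimal sketch of how such a series could be computed, assuming each speech is stored as a hypothetical `(date, text)` pair. The sketch also assumes that frequency is normalized by speech length (the word's count divided by the number of tokens), which may differ from the counts we actually plotted. The resulting pairs could then be passed to any plotting library.

```python
from collections import Counter
from datetime import date
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_over_time(speeches, word):
    """Return (date, relative frequency) pairs for `word`, sorted by date.

    `speeches` is a list of (date, text) tuples (a hypothetical layout).
    Relative frequency = count of `word` / total tokens in that speech.
    """
    series = []
    for day, text in sorted(speeches):
        tokens = tokenize(text)
        counts = Counter(tokens)
        rate = counts[word] / len(tokens) if tokens else 0.0
        series.append((day, rate))
    return series


# Toy speeches, invented purely for illustration.
speeches = [
    (date(2020, 10, 1), "Jobs jobs and more jobs"),
    (date(2020, 9, 1), "We will beat the virus and create jobs"),
]
jobs_series = frequency_over_time(speeches, "jobs")
```

Normalizing by speech length keeps a long rally speech from looking like a spike in a topic simply because it contains more words.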

Word Frequency Wordclouds

The word clouds highlight the most frequently used words in Donald Trump's and Joe Biden's speeches. We used stop words to remove words that are frequently used but provide little information. Some common English stop words include “I”, “she’ll”, and “the”. We also created a vector of our own stop words and added it to the built-in stop words data frame. Analyzing the candidates' most frequent words gives insight into the main issues each hoped to address and into their main policies. For example, “covid” is one of Biden's most frequent words, reflecting that eradicating the virus in the US was one of his main campaign policies. Likewise, “China” is one of Trump's most frequent words, reflecting his focus on China as a major foreign trade rival of the US.
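The stop word step can be sketched as follows. This is not our actual pipeline. `BUILT_IN_STOP_WORDS` is a tiny hypothetical stand-in for a full built-in stop word list, and the entries in `CUSTOM_STOP_WORDS` are invented examples of the kind of extra words one might add. The code removes every word in the combined set and counts what remains. Those counts are what a word cloud maps to font size.

```python
from collections import Counter
import re

# Tiny hypothetical subset standing in for a full built-in stop word list.
BUILT_IN_STOP_WORDS = {
    "i", "she'll", "the", "a", "and", "to", "of",
    "we", "is", "are", "that", "it", "in",
}

# Hypothetical custom additions, analogous to our own vector of stop words.
CUSTOM_STOP_WORDS = {"applause", "gonna"}

STOP_WORDS = BUILT_IN_STOP_WORDS | CUSTOM_STOP_WORDS


def top_words(text, n=10):
    """Return the n most common non-stop-word tokens as (word, count) pairs."""
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return Counter(kept).most_common(n)


sample = ("We are gonna beat the virus. The virus is the enemy, "
          "and we will beat it. (Applause)")
print(top_words(sample, 3))  # [('beat', 2), ('virus', 2), ('enemy', 1)]
```

Ties are listed in the order the words first appear, which is how `Counter.most_common` orders equal counts.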

Network Visualizations

To understand the relationships between words in the speeches, we looked at two sets of words. The first was the most common words across all speeches. The second was words tied to popular election topics such as climate change, health care, and COVID-19. For each set, we needed a metric for “closeness”, and to define one we adapted the approach used in a network analysis of Game of Thrones. Specifically, we defined the closeness of two words (within the full dataset) as the number of times the words occur within d words of each other in a single speech, where d is a parameter specifying this word distance. For the most common words, we chose a relatively low value, d = 10. These words are more generic than the topic words, so a smaller window picks up more meaningful relationships. For the popular election topics, we chose d = 50. The words involved in these topics occur much less frequently than the most common words, so a larger d was needed to capture their relationships.
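The closeness metric can be sketched in a few lines, assuming “within d words” means the two token positions differ by at most d. For each speech, the code records where each target word occurs. It then adds 1 to the edge between two words for every pair of occurrences that fall within the window. It sums these counts over all speeches. The function name `closeness` and the single-token treatment of topics are illustrative assumptions. Phrases such as “climate change” would first need to be merged into single tokens.

```python
from collections import defaultdict
from itertools import combinations
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def closeness(speeches, words, d):
    """Edge weights between target words, summed over all speeches.

    Every pair of positions (i, j), with word A at i and word B at j in the
    same speech and |i - j| <= d, adds 1 to the weight of edge (A, B).
    """
    targets = set(words)
    weights = defaultdict(int)
    for text in speeches:
        # Record every position at which each target word appears.
        positions = defaultdict(list)
        for i, token in enumerate(tokenize(text)):
            if token in targets:
                positions[token].append(i)
        # Compare positions for each unordered pair of distinct target words.
        for a, b in combinations(sorted(targets), 2):
            for i in positions[a]:
                for j in positions[b]:
                    if abs(i - j) <= d:
                        weights[(a, b)] += 1
    return dict(weights)


# Toy speeches, invented purely for illustration.
speeches = [
    "health care costs and health insurance",
    "climate change is a health issue",
]
words = ["health", "care", "climate"]
narrow = closeness(speeches, words, d=2)  # {('care', 'health'): 1}
wide = closeness(speeches, words, d=4)    # adds ('climate', 'health') and a second care-health hit
```

The toy example mirrors our reasoning about d. With a narrow window, the rarer pairing of “climate” and “health” is missed entirely. Widening the window picks it up, at the cost of also counting looser co-occurrences. The resulting dictionary can be handed to a graph library, with each weight used as an edge width.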

Text Mining

Talk about Python here (and include non-runnable chunk)

Network Analysis for Most Commonly Used Words

Speech Analysis Over Time

Limitations, Pitfalls, and Future Research

While we feel we adequately answered our specific questions, it is hard to draw inferences about larger questions, such as how the candidates' different speech patterns affected the outcome of the 2020 election. One interesting direction for future research is sentiment analysis of the speeches, to determine the tone of each candidate's language and how it shifted over time.